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Abstract 

This paper studies the MINLIP estimator for the identification 
of Wiener systems consisting of a sequence of a linear FIR dy- 
namical model, and a monotonically increasing (or decreasing) 
static function. Given T observations, this algorithm boils down 
to solving a convex quadratic program with 0{T) variables 
and inequality constraints, implementing an inference technique 
which is based entirely on model complexity controj] The re- 
sulting estimates of the linear submodel are found to be almost 
consistent when no noise is present in the data, under a condi- 
tion of smoothness of the true nonlinearity and local Persistency 
of Excitation (local PE) of the data. This result is novel as it 
does not rely on classical tools as a 'linearization' using a Tay- 
lor decomposition, nor exploits stochastic properties of the data. 
It is indicated how to extend the method to cope with noisy data, 
and empirical evidence contrasts performance of the estimator 
against other recently proposed techniques. 



1 INTRODUCTION 

The identification of Wiener systems has been consid- 
ered in many papers since the 1970s. Different exist- 
ing approaches could roughly be divided in methods us- 
ing (i) Invertible nonlinearities (reducing to Hammerstein 
identification), (ii) correlation based approaches exploit- 
ing stochastic properties of the signals |[3] [H, (iii) ap- 
proximate (recursive) PEM approaches providing a well- 
established framework for convergence analysis lfT6l [iTl 

' This is a rewritten, refined and extended version of the CDC 2010 
submission 'On the Identification of Monotone Wiener Systems'. 



and |l6l, (iv) subspace based approaches ifTSll . For a gen- 
eral overview see the survey (5). Specific applications 
towards identification with quantized outputs are consid- 
ered in ifTSl . see also the book lfT2ll . The present approach 
builds further on ideas developed in fT9"2l. Another rele- 
vant work is [14 1 which considers a similar identification 
task as we will do. The MINLIP estimator (or shortly 
MINLIP, we will clarify the abbreviation shortly) studied 
in this paper for identification of dynamic Wiener systems 
was originally conceived in the context of learning rank- 
ing functions and survival analysis, see fT3\ and earlier 
work of those authors. In [9], the authors studied the im- 
pact of model complexity control for the identification of 
Hammerstein systems. In fTOl the use of explicit com- 
plexity control was investigated in the context of adaptive 
filtering. 

While the literature on the identification of Wiener sys- 
tems is considerable, often theoretical understanding of 
the proposed techniques is restiicted to exposition of an 
appropriate technical implementation. Notable excep- 
tions are given in ifTTl . f5l| and |2|. The first work con- 
siders a Recursive Prediction Error Method (RPEM) of 
general Wiener models, and convergence properties are 
derived using the ODE framework as in I^TJ. This ap- 
proach makes considerable assumptions on the stochastic 
mechanisms underlying the signals used for (recursive) 
identification. The smoothing approach described in ISJ 
exploits as well a stochastic assumption of the involved 
signals, and asserts basically that the Wiener system can 
be identified by directly averaging out the nonlinear ef- 
fect. Although powerful concentration inequalities lie on 
the basis of this approach, no argument is given that this 



method applies for Wiener systems which are more com- 
plex (realistic) than the academic examples presented in 
those papers. 

This work was prompted by the earlier |19|, explor- 
ing the task of identification of monotone wiener systems. 
The practical algorithm proposed in that paper will always 
yield a trivial solutions (h — Od in their notation), and is 
as such to be depreciated. The line of thinking however 
looks powerful, and this led Q to investigate the ques- 
tion under what conditions on the static nonlinearity (be- 
sides monotonicity) a Wiener system is identifiable. The 
present work takes this results a step further, introduc- 
ing model complexity control into the picture. Specifi- 
cally, we express model complexity in terms of a Lips- 
chitz property of the estimated nonlinearity. The idea of 
minimizing (MIN-) this Lipschitz property (-LIP) results 
directly in an efficient identification technique termed 
the MINLIP estimator - or shortly MINLIP - which can 
be solved efficiently using tools of convex optimization. 
Specifically, we rephrase the identification task as a con- 
vex Quadratic Programming (QP) problem of 0{T) un- 
knowns and inequalities (where T denotes the number 
of samples). We as well present an analysis that the 
estimates given by this algorithm are almost consistent, 
where the approximation factor relies on the richness of 
the data in terms of a local measure of persistency of exci- 
tation, and the smoothness of the static nonlinearity. This 
analysis does not resort to local approximations using a 
Taylor decomposition, and does not need any stochastic 
setup. 

The contribution of this paper is threefold. Section 2 
describes the precise class of monotone Wiener systems 
which is envisaged, and discusses the main ideas motivat- 
ing the MINLIP estimator An artificial yet challenging 
case study provides empirical evidence for this estimator 
for noiseless data. Section 3 then establishes almost con- 
sistency of the estimates under appropriate conditions of 
the data used for identification, and the underlying model. 
This result is non-trivial as the considered model does not 
allow for a straightforward (finite) parametrization, and is 
not based on minimizing a model mismatch criterion as 
classical. In particular, we need to have that the data is 
locally Persistent Exciting (PE) while the true monotone 
nonlinearity needs to be smooth around its steepest part. 

Section 3 describes an extension towards the case 
where (i) data is perturbed by noise, (ii) the true system 
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Fig. 1 : Schematic representation of Wiener systems un- 
der consideration. The function /q : M — > M is 
assumed to be monotonically in- or decreasing. 
Neither Hq nor /q is assumed to be invertible. 



does not belong to the considered model class of mono- 
tone Wiener systems of given order, or (iii) where no true 
model is assumed to exist. Again, the MINLIP estimates 
are given by solving a convex Quadratic Program with 
0{T) unknowns and inequality constraints. Empirical ev- 
idence is given for the use of this estimator, and the degra- 
dation of the accuracy of the estimate in case of noise is 
investigated experimentally. Section 5 concludes the pa- 
per, and highlights a number of open research questions. 

2 Identification of Monotone Wiener 
Systems 

2.1 The Model Class 

This work focus on the identification of nonlinear dy- 
namic models in the following model class. 

Definition 1 (FIR Wiener Model (/, a)) A Wiener 
model consists of a sequence of ( i) a linear dynamical 
model characterized by an impulse response function 
H{q~^) (here is the backshift operator as classi- 
cally) applied on the input signal {ut}t, and (ii) a static 
nonlinear function / : ]R — M (see Fig. 1). If the 
signals {uf}t and {yt}t follow such a model with 'true' 
subsy stern Hq and 'true' function /p, we can write 



yt = fo{Ho{q-'){ut) 



(1) 



and we say that the observations come from the Wiener 
system {Hq, /q). For a FIR-Wiener model of order d, one 
considers a Finite Impulse Response (FIR) parametriza- 
tion of the linear subsystem, or H{q^^) = aiq"^ + • • • + 
adq^'^. We denote such model (in the context of this pa- 
per) shortly as the (/, a)-Wiener model. Now since a and 



the domain of f can be rescaled arbitrarily, it is conve- 
nient to impose \\a\\2 — 1, which does avoid identifiabilty 
issues. If the signals {ut}t and {yt}t obey such a model 
with 'true' function /o and 'true' parameters Aq, or 
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- where ag G M and ||ao||2 = 1 <^nd we define = 
(wt-i, . . . , Ut-d)^ G H^"* - we say that the observations 
come from the Wiener system (ag, /o). We will denote the 
set of possible observations as S — {(u^, j/f)}i C M'^xM. 

We specialize further to a subset of this class as follows, 
schematically illustrated in fig. ([T]i. 

Definition 2 (Monotone FIR Wiener Model (/, a)) 

A FIR-Wiener model (/, a) is called monotone ;/ 
/ : M ^ M /i monotonically increasing (but not 
necessarily invertible). 



yt = /(a^ut). 



(3) 



We define the Monotone Wiener model class formally as 



Monotonically increasing^ 
aeMM|a||2 = l}. (4) 



Note that by similarity one has (/, a) = (/', —a), where 
f{z) = f'{—z) for all z e M. Now /' is monotoni- 
cally decreasing, explaining why we can omit the denom- 
inator 'increasing' in the nomenclature. As argued be- 
fore, this class of monotone FIR-Wiener model can cap- 
ture such different effects as (1) quantized output mea- 
surements, (2) saturation effects of the sensor, and (3) 
handling of general bijective transformations of the out- 
put scaling (cfr the temperature scale of Celsius versus 
Fahrenheit), amongst others. 

2.2 Identification by MINLIP 

The identification technique implements the adagio 'make 
problems as simple as possible, but not simpler' . The sur- 
prising result is that this idea may yield consistent esti- 
mates, without any reference to notions as 'statistical like- 
lihood' or 'prediction error'. 



The problem of identification of a Wiener system from 
observations is traditionally formalized as follows 



a,/ 



min J(/,a) = ^ (/(a^u*) 

: k 2 = 1 . . 
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(5) 



t=d+l 



We refer to this formulation of the estimation problem as 
to a prediction error method for Wiener models - abbrevi- 
ated here as WPEM - and it will be mainly this approach 
we will contrast the proposed method against. Note that 
this approach in case of noise as in Section III is a merely 
an Output-Error (OE) technique, unless stringent stochas- 
tic assumptions can be made on the noise, see e.g. ||6l 
for a discussion. In general,this formulation is hard to 
solve as the unknowns a interact directly with the un- 
known function /. As a result one typically resorts to an 
iterative scheme or general purpose nonlinear optimiza- 
tion routine. The practical procedures lack generality and 
robustness for different reasons (i) depending on the form 
(or parametrization) of /, ill-conditioning of the problem 
may arise or even gradient information may not exist, (ii) 
the problem can often be stuck in local minima, which 
can be arbitrary bad (iii) procedures are highly depending 
on the exact representation of the unknown /. In general, 
such procedures are therefore not easily scalable to more 
complex settings. 

The approach we will advocate in this paper is however 
conceptually quite different. Rather than minimizing the 
equation errors, we look for the least complex model re- 
constructing the observations. What we mean by 'least 
complex model' is somehow up to the user to decide. In 
this paper we consider a specific complexity measure de- 
fined as follows which will eventually reduce the infer- 
ence problem to an optimization problem which can be 
solved efficiently, and for which we prove consistency. 

Definition 3 (Lipshitz Condition) Consider a 
function/ : M — > M. Assume there exists a constant L 
such that one has for all z, z' £ M.'^ that 

\f{z)-f{z')\<L\z-z'\, (6) 

then f is Lipschitz smooth with a constant L. 

Now, we structure the model class by the following nested 



sets 



G / : Lipschitz with constant L, \\a\\2 = 

(7) 

Now if Xi < L2 < ■ ■ ■ < Lfc are /c sorted constants, 
then one has a nested structure over the model class, i.e. 



^Li c j-i, c . . . c J^L, c T. 



(8) 



A plausible identification algorithm would now be to find 
the linear parameters a G such that the mapping from 
{a^Ut]t to the corresponding values {yt]t has as small a 
Lipschitz value L as possible. Since we are only interested 
at this stage in the parameters a rather than also recovering 
/, we focus attention on the given samples only. Conse- since zi 
quently, sufficient condition for our needs for a function 
/ to be Lipschitz smooth with constant L is 

|/(a^Ui) - /(a^Uj)| < L {n, ~ Uj)| <j = d,. 

(9) 

By exploiting the monotonicity property of /, one can 
write the 0{T^) constraints equivalently using only 0{T) 
constraints by ordering the data. This step captures the 
function / implicitly (see Figure (j2|), proven as follows. 

Lemma 1 (Existence of a Transformation Function) 

Given a collection of pairs y(i))}"^]^, enumerated 

such that y(^i-j < mj-^ if and only if i < j. Then we 
consider the sample conditions for L < 00: 



< {y(j) - y(.)) < L (z(^) - Vi < j 



1. 



(10) 



7. If one has for a finite L > that holds, then 
there exist a monotonically increasing function f : 
M — !■ M with Lipschitz constant L interpolating the 
samples. 

2. If one has that for all admissible {z,y) G M x M 
one has that y = f{z)for an (unknown) continuous, 
(finite) differentiable and monotonically increasing 
function / : M — > M, then there is an L < 00 such 
that dTOl holds. 



Proof: To proof 1, consider the linear interpolation func- 
tion /„ : M — > M, defined as 



fn{z) 



ziz) 



z{z) 



{yz{z)~yz{z)) +yz(z), (H) 



where we define z{z) = argminj(zi '■ Zi > z) and 
ziz) — argmaxi(zi : Zi < z). Direct manipulation 
iljows that this function is monotonically increasing and 
continuous. Now take z < z' G M, then we have to show 
that /„(z') — /„(z) < L{z' — z). For convenience of nota- 
tion define I = z{z), u — z{z), I' = z{z') and u' — z{z'), 
then 



z' — Zli 



Z — Zl 

■{yw-yi') — [yu - yi) + {yi' - yi) 

< L{z' - Zl,) - L{z - Zl) + L{zi> - Zl) 

= L{z'-z), (12) 

< Zl, and yi < yi, by definition. 
Item 2 is proven as follows. Let /' be the derivative of 

a differentiable function /q, then the mean value theorem 
asserts that for any two samples (z^, j/j) and (zj, yj) for 
wiuch Zi < Zj, there exists a z G (z^, Zj) C M such that 

{yj - yi) = {zj - Zi)f'{z) < L{zi - Zj) (13) 

where L — sup^ f'(z). This observation motivates the 
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Fig. 2: 



Schematic representation of Lemma 1. If a 
Lipschitz-smooth monotone f exists (black dash- 
dotted curved line), samples obey the pairwise 
Lipschitz constraint. If samples exist satisfying 
the Lipschitz constraints, one can always find a 
monotone function interpolating this samples (in- 
dicated by the yellow blocks). 



following procedure: find parameters a such that / has 



minimal Lipschitz condition, 
solving 



The solution is given by 



|T d+i jg ordered vector and 
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i,...,r, (14) 
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G {-1,0,1} 



(T-d)x(T-d+l) 



(18) 



or 



mill i 



S.t. 



klb = 1, 



{V(i)-y(t-i)) < L (a'^(u(i) - U(i_i))) , Vz = . 



where we have only a linear number of constraints. After 
a change of variable, we can write equivalently 

Definition 4 (MINLIP) Given an ordered set of exam- 
ples {(u(i), C X M indexed such that 
y{i~i) ^ y(i)for all i — d+l, . . . ,T, then our (rescaled) 
estimate ax follows by solving 

T 

qt — argmm a a s.t. 

a 

(y(i)-y(j-i)) < a^(u(j)-U(j_i)), Vi = d+1, ...,T. 

(16) 

where the estimated function f is specified only implicitly 
as in Proposition 1. 

This problem can be cast as a convex Quadratic Program 
(QP) with T — d linear constraints and d unknowns. This 
problem can be solved efficiently with contemporarily 
solvers available in most mathematical packageq^ The 
aT which minimizes this constrained objective is our es- 
timate of a (rescaled) version of the parameters of the FIR 
system H{q). The problem is written in matrix notation 
as 

aT — argmin a"^a s.t. (Au)a < Ay, (17) 

where u = (u(rf), . . . , U(t)) e ^(T-d+i)xd ^ ^iw- 
kel matrix (up to the sorting!), y = (yj^), . . . , y^T)) e 

^ In our experiments we use the solver available at 
|http://www.mosek.org| 



In order to improve reproducibility of the result and stress 
practical use, a full MATLAB implementation in 8 lines 
of code is given in Alg. ([T]l. In order to extend this imple- 
mentation to handle general cases one should take addi- 
ytional care of (possible) ties of output samples (See Sub- 
' ^^gytion IV.C for a practical way to cope with this issue). 

Algoritlim 1 A MATLAB implementation of MINLIP 

function a = MINLIP (u, y, d) 
n = length (y ) ; 

xl = toeplit z (u (d : end) , u (d : -1 : 1 ) ) ; 
y 1 = y (d : end) ; 
[ys, si] =sort (yl) ; 
xs = xl (si, : ) ; 
e = ones (n, 1 ) ; 

D = full (spdiags ( [-e e],[0 1 ] , n-d, n-d+1 ) ) ; 
a = quadprog (eye (d) , zeros (d, 1 ), -D*xs , -D*ys ) 



In a second phase, it could be useful to reconstruct 
/o based on ax and the samples {(a^Ut, j/j)}^^. We 
suggest to use the linear interpolation defined in ( [T2] i to 
proof Lemma 1 . This function is by construction Lips- 
chitz smooth with constant L — \J o^gt- It is not too 
difficult to come up with more parsimonious estimators 
of the univariate function /o based on the bivariate sam- 
ples {{aTpUt,yt)}J^ci which behaves more robust against 
modeling errors. 

2.3 Almost Consistency of MINLIP in 
the Noiseless Case 

This section characterizes how well the estimate approach 
the true impulse response, under suitable assumptions on 
the data and the static nonlinearity. Particularly, we as- 
sume that the observations arise from a true Monotone 
Wiener model with FIR system of given order for the dy- 
namic part, and that no noise perturbs the observations. 
This problem is non-trivial as (i) the proposed model 



is essentially nonlinear and non-parametric, and (ii) the 
method is not based directly on minimizing a mismatch 
between the model and the observations. The main out- 
come is that (approximate) consistent estimates are given 
when the data satisfies a condition of local Persistency of 
Excitation (PE), and the true monotone static nonlinearity 
is smooth around its steepest part. 

This result is referred to here as almost consistency, 
as it guarantees accuracy of the estimates only up to 
a small (but often non-zero) approximation term. In a 
sense, this is the best one could hope for here because 
of two reasons,: (a) the model is non-parametric (or semi- 
parametric) as the static monotone function of the model 
cannot be expressed straightforwardly in terms of a (small 
number of) parameters. In that respect, a finite dataset 
contains never have enough information in order to re- 
construct this system exactly, (b) A finite dataset can 
never be locally exciting in every (arbitrary small) neigh- 
borhood, but can only guarantee this condition for all lo- 
calities which are sufficiently large. Classical concepts as 
bias or variance do not cover this notion as no stochas- 
tic assumptions are made. The question wether approxi- 
mate consistency implies asymptotic consistency requires 
one as well to make additional stochastic assumptions un- 
derlying the data, and is not covered in this text as such. 
The analogue for linear estimating of a FIR model goes 
as follows: assume the system can be described exactly 
as a FIR model of given order (smaller than) d, then the 
corresponding notion is that the least squares estimate is 
exactly consistent if the data is (globally) PE to an order 
d. Since we assume that there is no noise in the data, there 
is no approximation to be made here. This difference can 
be seen as the consequence of the semi-parametric model 
of the Monotone Wiener system where in general no fi- 
nite/small parameterization exists matching the system. 

Assume that the observed system obeys the relation 



on the (infinite) set S as 



given in ( 32 1 with a fixed (but unknown) monotonically 
increasing function /o : M ^ M which is Lipschitz 
monotone with constant Lq < oo, and parameter vec- 
tor oq e M.'^. We refer to those as to the true function /g 
and the true parameters qq respectively. We address the 
question wether if we see enough data (or T — > oo), the 
MINLIP estimate ut will equal qq up to a scaling con- 
stant. Formally, we consider the MINLIP estimator based 



L 



min L 

L,||a||2 = l 



s.t. {y-y') < ia^(u-u'), V(u, y), (u', y') eS,y> y' . 



(19) 



Sometimes it will be convenient to rewrite the MINLIP 



estimator ( 19 1 as the following minimax problem: 
max inf "^("'"r 

||a||2 = l iii,y)AM'.y)eS:y>y' y-y' 



(20) 



where i > hy construction of Lq. In order to char- 
acterize the solution, the following two conditions are 
needed. 

Definition 5 (/ is (Lq, 5)-Lipschitz on S' C M) The 

function f is said to be {Lq, g)-Lipschitz on S' CM 
for a decreasing, positive function g : IR+ — > with 
g{0) — 1 if (A) one has for all z, z' G S' that 

{f{z)~ fiz'))<Lo{z-z'). (21) 

(B) there exists a z Cz S' and z' G S' such that 

(/(z) - /(z')) =U{z- z') , (22) 

and ( C) one has for this z, for any e > and z" G S' 
where \z — z"\ < e that 



\f{z)~fiz")\>gi\z-z"\)Lo\z~z"\. 



(23) 



Hence g denotes how 'smooth' the constant L decays in a 
neighborhood of z where the actual Lipschitz constraint 
is met (that is, a slower decaying function g indicates 
a higher smoothness). In particular, a value h{e) = 1 
implies that the function / is linear with slope Lq in 
this neighborhood. Such characterization is illustrated for 
f{z) = tanh(z) in Figure ^ for a smoothness function 
.g(e) - 1/(1 + ce). 

Definition 6 (e-Local Persistently Exciting) We say 

that a set S ^ M'' is e-local persistent exciting of order d 
for e > iff for any vector u G 5, there exist d vectors 
Ui, . . . , Urf G S with {u — Ufcj^j^ linearly independent 
vectors and 



Ufclla < e, Vfc = 1, 



(24) 
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Fig. 3: 5c/!eOTafic illustration of the [Lq, g)-Lipschitz 
property of a function f with g{e) =^ jif^- There 
should be a sample z where the Lipschitz constant 
Lq is attained, and in the e-neighborhood of this 
sample z the Lipschitz-property shouldn't decay 
too fast, e.g. in the neighborhood of z the func- 
tion behave almost linearly. 



This definition can be seen as a local version of Persis- 
tency of Excitation (PE), see e.g. ||8][TTl|4l for the classi- 
cal definition of PE. 

Theorem 1 (Almost Consistency) Fix e > and con- 
sider the {fo,ao)-Wiener system as in Q with corre- 
sponding observations in S. If fo : R ^ M. is {Lo,g)- 
Lipschitz and monotone on the set {{z,y) : z = ap^u : 
u e iS} and S is e-local PE, then 



a?ao > g(e), 
where is the estimate ofMINLIP as in (|76p. 



(25) 



This means that the smoother the function /o is towards 
its steepest part, the better estimates we get. In particular, 
when fo is (almost) linear - we only need global PE to 
have exact estimates ax oc aq. Specifically, if g{e) = 1 - 
meaning that the function h is linear - only global persis- 
tency of excitation is required to have consistency, and the 
MINLIP estimator will return the same result as a linear 
estimator. 

Proof: Let the Lipschitz constant be achieved in the sam- 
ple (u, y), (u', J/') S S, such that 



As the set S is e-local persistent exciting of order d, one 
can find for the sample (u, y) E S d vectors Ui , . . . , 
contained in 5 such that the vectors (u— Ui), . . . , (u— u^) 
are linearly independent and have norm smaller than e. 
This implies that one can rewrite aQ,aT G K'' (i.e. the 
true parameter vector and the optimal estimate associated 
to ([1^) as 



(27) 



where a,ao G 



and we let (u — u^) 



crfc(u 



Ufc)||(u - Uk)\\2 with o-fc e {-1,1} such that 

T 

||(u— Ufc)||2 — 1 and (u — u^) > for all k = 
1, . . . ,d. Define as previously for each k = 1, . . . ,d the 
constant Lk G M+ such that {y — yu) — Lk{u~ u/c)^ao, 
where Lk < Lq by construction. Now define the matrix 

A, e R'^'''^ as 



D„ = 





-ui)1 


(u 


- U2) 







(28) 



such that D^aQ > 0^, and the matrix L e R'^^'^ as L = 
diag(Li, . . . , Lrf). Then we have in matrix notation that 
ao = D^ao and ot — D^a. By construction of the 
MINLIP estimator we have that 

LDuDlao = LD^ao < W^ar = W^D^a, (29) 

where £ is the minimal value obtained in ([19]). As such 
jLDuD^ao < DuD^a. Then one has 

> 



g{e)Lo J, J, 



{y-y')=Loiu-u'y 



ao- 



(26) 



> gie)a^iD^Dl)ao > <?(e), (30) 
since £ < Lq. This proofs the result. 

3 IDENTIFICATION WITH NOISY DATA 

This section considers how to modify the estimator to- 
wards the case of noise being present in the data, or where 



the data-samples are only approximated by a monotone 
FIR (/, a)-system. Empirical evidence is provided for the 
use of this estimator. 

3.1 Model Class 

There are a number of different ways one can model noise 
in the class of monotone wiener systems. A first one is 
to consider noise on the measured outputs (or measure- 
ment noise). One may argue that this model is not a very 
realistic assumption in case the observations are a quan- 
tized version of the output of the linear system. That is, 
once the signal is quantized (and transmitted), it can of- 
ten be measured without error. On the other hand, it is 
often not clear which noise model of a quantized signal 
(with a finite number of different levels) fits the applica- 
tion (a Gaussian distribution would not make much sense 
here). Another assumption one can make is that noise oc- 
curs in the signal {ut}t, but as this results in coloring of 
the noise by the unknown linear subsystem, this model is 
not adopted as yet. A third alternative is that the noise 
comes in between the linear subsystem and the monotone 
static function (see Fig. |4]i. As in the following no re- 
strictive assumptions (as whiteness of the noise signal) is 
assumed, this could be seen as uncertainty coming in in 
the model by under-modeling of the linear system. This 
is the view which underlies the following definitions. 

Definition 7 (Noisy FIR Wiener Model (/, a)) A FIR 

Wiener model consists of a sequence of (i) a linear 
dynamical model characterized by an impulse response 
function Hi^q"^) applied on the input signal {ut]t, (H) ci 
static nonlinear function / : M — !■ M, and ( Hi) a sequence 
of 'noise' terms {et}t- If the signals {ut}t, {yt}t ond 
{et}t follow such a model with 'true' subsystem Hq and 
'true' function fo, we can write 



(31) 



If the signals {ut}t, {yt}t ond {et}t obey a Wiener FIR- 
model with 'true ' function /g and 'true ' parameters Gq, 
or 



Vt 



/o X! "'0,kUt-k + ef = /o(ao ut + et), (32) 
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Fig. 4: Schematic representation of a noisy monotone 
wiener system. This paper adopts the setting that 
noise comes in after the Unear dynamic part (cap- 
turing model mismatch), and right before appUca- 
tion of the static nonUnearity (or quantization). 



3.2 MINLIP for Noisy Data 

Given time-series {ut}t and {yt}t, referred to 
as 'input' and 'output'. Again, let {(u( = 
{ut-.d+i,---,ut),yt)}f^a C M'' X M be a dataset 
containing T — d + 1 samples. Let this set be rein- 
dexed as {(u(j), y(j))}Jl^ where 7/(i) < y/^^ for all 
d 



< T. Then adopting the noisy model ([32|) 



where uq S M'^ and ||ao||2 = 1- 



< I < J 

suggests modification of the standard MINLIP (see eq. 
([16)) as 



I r 1 I I 
mm -a a + -^\e^\ 

t=d 

s-t- {y{i)-y{i~i)) < (a^U(,;)+e(i))-(a'^U(i_i)+e(j_i)), 

Vi = d+l,...,r, (33) 

where the fixed regularization parameter 7 > trades the 
Lipschitz based regularization term and the penalization 
of the residuals. The choice of penalizing the absolute 
loss of the residuals is inspired by (i) robustness consid- 
erations and the (ii) non-stochastic nature of the residuals 
where a worst-case approach is more suited. The tuning 
of this constant can be done with an appropriate model 
selection criterion as cross-validation. As before, this op- 
timization problem can be solved efficiently as a convex 
quadratic program (QP) using 0{T) unknowns and in- 
equality constraints. 

4 Empirical Evidence 

This section spells out a number of artificial yet challeng- 
ing case studies. A main argument which is made here 
is that although the underlying system Hq may be repre- 



sented as a fractional polynomial, it is often useful to con- 
sider a FIR model consisting of a large number of tapped 
delays (say d = O(IOO)) which can approach (the im- 
pulse response function of) Hq arbitrarily close. In this 
way, one does not have to specify explicitly model or- 
der or delay of the model. We will refer to this approach 
as an over-parametrization. In order to choose the num- 
ber d of tapped delays in the FIR model, one may per- 
form a nonparametric analysis of the impulse response of 
the data (neglecting nonlinear effects as yet). It is found 
that MINLIP is especially appropriate for such an over- 
parametrization approach and doesn't loose much effi- 
ciency as it builds in explicitly a mechanism of model 
complexity control and regularization, dealing with ill- 
posedness problems often present in such a context. 

4.1 The Experimental setup 

The Monotone Wiener systems from which the data is 
generated take the following form. The linear subsystem 
Ho{q~^) are represented as fractional polynomial models 

as 

. -1, Bjq-^) ^ &0 + 6lg-^ + ---+b2m.g-^"^- 

(34) 

where 2mz > and 2mp > denote the orders of the 
polynomials A{q^^) and B{q^^). Those polynomials are 
chosen such that they have rriz and nip conjugate pairs of 
zeros and poles respectively. The zeros of A{q~'^) are re- 
ferred to as poles of Ho{q^^), and the zeros of B{q~^) are 
referred to as zeros of Ho{q^^). In this example, we set 
Uz = 2 and Up = 20. The conjugate poles and conjugate 
zeros are uniformly at random picked inside the unit cir- 
cle (see Figure [5]for an example). In general, we see that 
a FIR representation of c? = 200 is sufficient to capture 
the dynamics of such a system. The output nonlinearity is 
fixed as /o : M — > M where for a; G M one has 

fo{x) = 2 + tanh(5a; + 2) + 0.5tanh(5x-3). (35) 

This function is somewhat challenging as it cannot be de- 
scribed as a simple saturation function, is not symmet- 
ric around any point, and has an almost zero gradient in 
X — 0. Then a monotone Winer system is constructed as 

yt = foigHo{q-')ut), Vi = 1, . . . ,r, (36) 



where the gain g > is chosen such that the values 
{gHo{q~'^)ut}t have a unit standard deviation. The es- 
timates of MINLIP on a time-series of length T=450, 
500,550,600 - taken from the Wiener System of Fig. Q, 
Fig. (j6]) - is displayed in Fig. (j6]l. Here a FIR approxi- 
mation of d = 200 is used, capturing the dynamics of the 
system Hq reasonably well (see Fig. (j5]a)). 




Fig. 5: An example of a system Hq randomly generated, 
with Up = 20 and = 2. Panel (a) shows the 
resulting impulse response ( and hence a FIR ap- 
proximation) up to lag d — 200. Panel (b) dis- 
plays the conjugate poles and conjugate zeros in 
the complex domain using a pole-zero plot of the 
system. Observe ( i) the presence of a considerable 
(non-zero) delay, and (ii) the fact that some poles 
are located close to the unit circle. This makes a 
inverse modeling approach unfeasible. 

The following 6 different identification methods were 
implemented to benchmark the MINLIP against: 

1. (LS 'x — y') In order to provide a (naive) lower- 
bound to the performance, a FIR identification tech- 
nique based on a Least Squares (LS) argument was 
implemented on the signals {ut}t and {yt}t directly, 
neglecting the Wiener structure altogether. 

2. (LS 'x — z') In order to get an upper-bound on the 
performance of the identification technique, an ARX 
identification technique was implemented based on 
the (latent) intermediate signal {zt — Ho{q~^)ut}t. 
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Fig. 6: Evolution of the estimate of MINLIP when pro- 
vided by signals of length (a) 450, (b) 500, (c) 
550, (d) 600 samples, taken from the Wiener 
model described in Figure ([5]) and eq. ( [35| . A 
FIR model of d ~ 200 is used to approximate the 
linear system, taking care of the delay as well of 
the (unknown) model orders. 



3. (WPEM FIR) We consider a FIR model structure of 
sufficiently high order (here d 



200) such that ([34 1 
can be represented fairly well, and we let the corre- 
sponding FIR coefficients act directly as unknowns. 
The nonlinearity is represented as a piecewise func- 
tion based on 20 fixed knots which were optimally 
tuned to the example at hand. Global optimization 
on both FIR coefficients as well as on the unknowns 
of the nonlinearity is performed by the Broyden- 
Fletcher-Goldfarb-Shannon (BFGS) method imple- 
mented in MATLAB in the fminunc function. 

4. (WPEM ARX) Here we implement the same ap- 
proach now based on a class of ARX models repre- 



senting the optimal predictors corresponding to (34 1. 
Now, we let the poles and zeros of this model class 
act as unknowns directly, and As before, the non- 
linearity is expressed as a piecewise linear function 
with fixed grid points. Global optimization is per- 
formed by the BFGS method. 



5. (Greblicki2002) The smoothing approach as de- 
scribed in Q is implemented as well. This method 



works directly on a FIR overparameterization of the 
(linear sub-) system, and gives reasonable estimates 
when sufficiently many input-samples following a 
stochastic (approximately white) Gaussian process 
are provided. 

6. (Bai2006) The last approach we benchmark against 
is the technique described in |19, 2| using prior 
knowledge of the monotonicity of the output func- 
tion. This technique is implemented by solving 

T 

mm ^ (sign(y(j) - Vu-i)) ~ sign((u(j-) - U(j_i))'^a))' 
j=d+i 

(37) 

which is in our experiments solved by the BFGS 
method. We found that in order to make this ap- 
proach to work well one needs to resort to a smooth 
proxy 'sign' of the discrete function 'sign(z) = 
I{z > 0) — I{z < 0)', making gradient information 
available at most unknowns in the search space. In 
many cases solving this problem takes substantially 
more resources (CPU-power, memory) compared to 
the other techniques. 

Accuracy of an estimate is expressed in terms of the 
angle between the true impulse response of Hq and the 
impulse response of the estimate Ht- Let as classical the 
L2 norm of a system H be defined as 



(38) 



t=o 



where Sr is a time-series of all zeros except for the first 
location which equals one. Then the correlation of two 
systems Hq and Ht may be defined as 



||^^o||2||-ffT||2 



(39) 



If the systems Hq and Ht have impulse response vec- 
tor ho and h respectively, this coefficient can be written 
as the Pearson correlation coefficient between those two 
vectors, or - — . There are 3 reasons for adopting 

II 'lo II 2 II 'i II 2 

this definition. The first is that the gain of the system Hq 
is unidentifiable (hence cannot play a role in the quality 
measure), and the second one is that the impulse response 



is the common denominator for any LTI, independently 
of a parametrization. Thirdly, the current method concen- 
trates on first instance only at identification of the linear 
system, while making predictions requires additional esti- 
mation of the static nonlinearity. As such, measuring per- 
formance based on prediction accuracy would convolute 
the results with performance of this reconstruction step as 
well. 

4.2 A Noiseless Example 

The first experiment is based on noiseless data. Fig- 
ure (j7]i shows the results of the experiment, where in 
each iteration T samples are generated from a random 
system {fo.Ho), the different identification algorithms 
are carried out, and their respective accuracy is com- 
puted. This iteration is performed 100 times for any T — 
300, 400, 500, . . . , 1000. We see from the results that the 
MINLIP estimator converges fast to the best achievable 
performance (indicated by the LS 'x — z' approach). The 
WPEM algorithms give in many cases unreliable results, 
performing much worse on the average. Specifically, we 
find that it is bad practice to combine the WPEM on (over- 
parametrized) FIR model, perhaps because global opti- 
mization often presents (numerical) problems when opti- 
mizing over so large a set of parameters. MINLIP how- 
ever can handle such overparametrization quite efficiently, 
and does as such not require to determine the model or- 
ders and the delay of the system. This suggests the use of 
complexity control to give a principled tool to handle the 
task of model order selection. 

4.3 Quantized signals 

Here we investigate the use of MINLIP in case the out- 
put signal is quantized (i.e. takes a small number of dif- 
ferent values). This task poses additional challenges as 
there is a direct need to handle tied output values well. 
We adapt the following procedure: if — y{i) 

(tied), then we cannot compare the corresponding val- 
ues a-'"u(i_i) and a^u^ij unambiguously. Rather, we 
compare a^U(j_i') and a-^Uj-j) both with the samples 
{Uj^a^vLj) and {yk,a^vLk) which have a strictly lower 
value Dj < < yk- Again by transitivity of the 

relation < one can prune many of the relations in the fi- 
nal QP. In the worst case only 2 different output levels are 
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Fig. 7: Results of the first experiment using noiseless data 
generated from a monotone Wiener nonlinearity 
as in eq. ( |35| ) and systems Hq as in ( |34p . Perfor- 
mance expressed as correlations of Hq and Ht 
as in ( |39P are displayed for sample sizes rang- 
ing from T=210 to T=1000, using a FIR over- 
parametrization of d — 200. The vertical line 
denotes the place where a least squares technique 
would exactly reconstruct Hq if Zt ~ HQ{q^^)ut 
were given. 

observed on T samples, and both levels are each observed 
on 1^ samples. Then one has to work with ^ inequal- 
ity constraints rather than the 0{T) ones in the standard 
implementation. It may be argued that the theoretical ac- 
count for Monotone Wiener systems as given above does 
not hold strictly as the steepest part ('the jumps') have an 
unbounded Lipschitz constant. However, as the dataset 
is finite, this measure is necessarily finite and the method 
continuous to yield good solutions. If there are no samples 
present 'near the jumps', the analysis hold with an appro- 
priate function g dependent on this 'margin', and the size 
of the jumps. 

In this example we define the output nonlinearity for 

any z G M as 



/^(z)=/(z>-0.5)+/(z>2), 



(40) 



with the output taking values in the set {0, 1, 2}, and the 
jumps of the function occurring when z = —0.5 and 
z = 2. Again, the linear systems Hq used to generate the 



signals are as in eq. ( 34 1, and we benchmark the MINLIP 
against the approaches described in the previous subsec- 
tion (LS, WPEM and BAI). The results are displayed in 

Fig. m. 



From these results we may suggest a few guidelines. 
The first is that a naive LS 'x — y' regression works sur- 
prisingly good, and it is not at all trivial to beat this one. 
The reason the approaches based on global optimization 
(i.e. WPEM, WPEM.FIR and Bai2006) do not work as 
good might be that the discrete nature of the identifica- 
tion task translates in a highly non-smooth cost surface 
given to the optimizer This experiment however suggests 
that MINLIP achieves a solution which is often close to 
the best one could hope for (indicated by the LS — z' 
method). The averaging approach proposed in |5 1 appears 
fairly robust to the quantization effects as well. 




Fig 



8: Results of the quantization experiment using 
noiseless data generated from a monotone Wiener 
nonlinearity as in eq. ( |4Qp and systems Hq as in 
p4|. Performance expressed as correlations of 



Hq and Ht as in (39\ are displayed for sample 
sizes ranging from T ~ 210 to T — 1000, using a 
FIR overparameterization of d ^ 200. 



4.4 The noisy case 

This subsection reports results achieved with MINLIP in 
case the intermediate signal {zt]t is perturbed by noise. 
The experiment is set up as before in Subsection 2.3, but 
the system becomes now 



yt 



fo {Ho{q-')ut + et) , yt ^ 1, 



T 



(41) 



where /o is as in eq. ([35]l and Hq is randomly generated 



as in (34 1. The terms {et\t are zero mean white Gaussian 
noise with standard deviation > 0. This experiment 
is conceived in a slightly different manner than before. 



We consider a fixed number of samples T = 500, and let 
the Signal-to-Noise Ratio (SNR) vai-y from 0.1 to 10 (or 
(Te = 0.1, ... , 10). As indicated in Section III, MINLIP 
is dependent on the choice of a suitable 7 > 0, which is 
in turn depending on the noise variance a^. As this char- 
acteristic is unknown in general in practical applications, 
the choice of 7 is to be made based on a suitable model 
selection technique. Actually, the problem of model se- 
lection in the context of a Wiener model is not covered as 
such, and prompts new questions related to information 
criteria, stability and consistency. For now, we use a fixed 
value of 7 = 10 which works well in many cases. The 
results are displayed in graph (j9]l. Those results indicate 
that the MINLIP outperforms the other techniques espe- 
cially when noise is small compared to the 'informative' 
signal, while all techniques become arbitrarily bad when 
this ratio grows. 

In the next experiment fix an SNR of 3 and lets see what 
happens to the performance of the different estimators if 
the given signals have an increasing length. The evolu- 
tion of the average accuracy of MINLIP and the compet- 
ing estimators is given in Fig. ( 10 1. From this result we 



see that the behavior is not too different from the noise- 
less case, except the fact that the WPEM and WPEM.FIR 
approaches are not very robust to noise, and the approach 



in ( 37 1 is clearly a bad choice in this case and needs addi- 
tional care. 



5 DISCUSSION 

This paper studies how MINLIP works for identification 
of monotone Wiener systems. Theoretical as well as 
empirical evidence indicates the use of the estimate de- 
spite its unconventional groundings. Especially, one of 
the points is that this method based on model complex- 
ity control can handle FIR overparameterizations of the 
linear subsystem quite efficiently, implementing implic- 
itly model order- and delay- estimation during the iden- 
tification task. The crux of the method is to place model 
complexity control in the centre of the identification task. 
The hope is that this line of thinking provides novel ideas 
which are useful in the design and analysis of identifica- 
tion algorithms for more general nonlinear systems. A 
main open question is a theoretical study of the influence 
of noise in the almost consistency result. 
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Results of the third experiment using noisy data 
generated from a monotone Wiener nonlinearity 
as in eq. \35\ and systems Hq as in ( |34| ). Perfor- 
mance expressed as correlations of Hq and Ht 
as in ( |39| ) are displayed for a sample of sizes 
T — 500. The amount of noise varies from 
CTe = 0.1 to (Te = 3, where the linear system Hq 
is rescaled such that Uz = ^ ( corresponding with 
SNRfrom 10 to 0.66). MINLIP was implemented 
with a fixed 7 = 10, and can be seen to outper- 
form other methods especially when the SNR is 
relatively high. 
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10: Results of the last experiment using noisy data 
generated from a monotone Wiener nonlinearity 
as in eq. ( [35| and systems Hq as in ( |i4) ), and 
SNR equal to 3. Performances of the different 
estimators are ex pres sed as X— correlations of 
Hq and Ht as in (39 L The sample sizes ranges 
from T 



210 toT ^ 1000, using a FIR over- 
parametrization of d — 200. 
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